PCOS Analysis

Agnes Lorenzen, Cecille Hobbs, Freja E. Klippmann, Julie Dalgaard Petersen & Mille Rask Sander

Introduction

Background

  • Polycystic ovary syndrome (PCOS) is a syndrome documented in women of reproductive age

  • Commonly documented symptoms include period pain, irregular periods, ovary-related problems and hormone imbalance

  • Patients with PCOS often have difficulty becoming pregnant and may experience complications during pregnancy

  • However, the cause of PCOS has still not been established.

Aim

The aim of this study is to examine a data set (found on Kaggle) of patients with and without PCOS. The data set was collected in India, with data from 10 different hospitals.

Data handling approach

  • Raw data:
    541 observations of 45 variables

  • 01_load_data:
    Simply loads the data

  • 02_clean_data:

    • Replacing invalid values in individual cells with NA
    • Renaming & factorizing columns
    • Splitting data frame into body and blood measurements
    • Removing empty column
  • 03_augment:

    • Unit changes (inch to cm)
    • Rounding & grouping BMI
    • Changing blood type and cycles from numeric values to characters
    • Creating new column for cycle/pregnancy stage
    • Merging data frames into one file
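
As a rough illustration of the 02_clean_data steps listed above, the sketch below runs them on a small made-up table. The column names (`patient_id`, `PCOS`, `AMH`, `weight_kg`, `empty_col`) and all values are assumptions chosen for the example, not the actual Kaggle columns or the project's own script.

```r
library(dplyr)

# Toy stand-in for the raw data; column names and values are hypothetical
raw <- tibble(
  patient_id = 1:4,
  PCOS       = c(0, 1, 0, 1),
  AMH        = c("2.1", "a", "3.4", "1.9"),  # "a" mimics a stray non-numeric cell
  weight_kg  = c(60, 72, 55, 80),
  empty_col  = c(NA, NA, NA, NA)
)

clean <- raw |>
  # Non-numeric strings become NA when coerced to numeric
  mutate(AMH = suppressWarnings(as.numeric(AMH))) |>
  # Rename and factorize the diagnosis column
  rename(PCOS_diagnosis = PCOS) |>
  mutate(PCOS_diagnosis = factor(PCOS_diagnosis,
                                 levels = c(0, 1),
                                 labels = c("No", "Yes"))) |>
  # Drop columns that contain only NA
  select(where(~ !all(is.na(.x))))

# Split into body and blood measurements, keeping the ID for a later merge
body_measurements  <- clean |> select(patient_id, PCOS_diagnosis, weight_kg)
blood_measurements <- clean |> select(patient_id, AMH)
```

Keeping an identifier column in both halves is one way to make the later merge in 03_augment unambiguous.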

# Rounding of BMI and dividing into categories
library(dplyr)

body_measurements <- body_measurements |>
  mutate(BMI = round(BMI, 1)) |>
  # case_when() uses the first matching condition, so each line
  # only needs the upper bound of its class
  mutate(BMI_class = case_when(
    BMI < 18.5 ~ "Underweight",
    BMI < 25 ~ "Normal weight",
    BMI < 30 ~ "Overweight",
    BMI >= 30 ~ "Obesity")) |>
  # Ordered levels keep the classes in a sensible order in plots and tables
  mutate(BMI_class = factor(BMI_class,
                            levels =  c("Underweight", 
                                        "Normal weight",
                                        "Overweight", 
                                        "Obesity"))) |>
  relocate(BMI_class, .after = BMI)
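
The other 03_augment steps (unit change, recoding numeric codes as characters, merging) might look roughly like the sketch below. The column names `hip_inch` and `blood_group`, the join key `patient_id`, and the code-to-label mapping are all assumptions for illustration; the mapping actually used for the data set is not shown on these slides.

```r
library(dplyr)

# Hypothetical split tables; names, codes, and values are made up
body_measurements <- tibble(
  patient_id = 1:3,
  hip_inch   = c(36, 38, 40)
)
blood_measurements <- tibble(
  patient_id  = 1:3,
  blood_group = c(11, 13, 15)
)

# Unit change: 1 inch = 2.54 cm
body_measurements <- body_measurements |>
  mutate(hip_cm = hip_inch * 2.54) |>
  select(-hip_inch)

# Numeric codes to character labels; this mapping is an assumption
blood_measurements <- blood_measurements |>
  mutate(blood_group = case_when(
    blood_group == 11 ~ "A+",
    blood_group == 13 ~ "B+",
    blood_group == 15 ~ "O+",
    TRUE ~ NA_character_))

# Merge the two data frames back into one table
pcos <- full_join(body_measurements, blood_measurements, by = "patient_id")
```

A `full_join()` keeps patients that appear in only one of the two tables, which seems the safer choice when both halves come from the same raw file.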

Descriptive analysis of data

Dimensions:

# A tibble: 2 × 1
  `PCOS dimensions`
              <int>
1               541
2                44

Count of how many have PCOS:

# A tibble: 2 × 2
  PCOS_diagnosis     n
  <chr>          <int>
1 No               364
2 Yes              177
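
Summaries like the two above can be produced in a couple of lines. The sketch below uses a toy table named `pcos` (a hypothetical name) in place of the merged data set, so the numbers differ from those on the slide.

```r
library(dplyr)

# Toy merged table; the real one has 541 rows and 44 columns
pcos <- tibble(
  patient_id     = 1:5,
  PCOS_diagnosis = c("No", "Yes", "No", "No", "Yes")
)

# Rows and columns as a one-column tibble
dims <- tibble(`PCOS dimensions` = dim(pcos))

# Number of patients per diagnosis
diagnosis_counts <- pcos |> count(PCOS_diagnosis)
```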

Body measurement - Follicle number

Follicle number and PCOS diagnosis:

Blood measurement data analysis

PCA of blood measurements

No separation between PCOS-diagnosed and non-PCOS-diagnosed individuals.

PCA of body measurements

Slight divergence between PCOS and non-PCOS individuals in body measurements.
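
For reference, a PCA of this kind can be run with base R's `prcomp()`. This is a minimal sketch on synthetic data, assuming scaled variables and complete cases only; the slides do not show the actual PCA call or the variables that went into it, so the column names and values here are made up.

```r
set.seed(1)

# Synthetic stand-in for numeric measurements; names and values are made up
measurements <- data.frame(
  FSH = rnorm(50, mean = 6, sd = 2),
  LH  = rnorm(50, mean = 5, sd = 2),
  BMI = rnorm(50, mean = 24, sd = 4)
)
diagnosis <- factor(rep(c("No", "Yes"), each = 25))

# PCA needs complete numeric data; scaling puts variables on a common footing
complete <- complete.cases(measurements)
pca <- prcomp(measurements[complete, ], center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Scores on the first two components, coloured by diagnosis
plot(pca$x[, 1], pca$x[, 2],
     col = diagnosis[complete],
     xlab = "PC1", ylab = "PC2")
```

If the two diagnosis groups overlap in the PC1/PC2 plot, as reported for the blood measurements, the leading components do not separate them.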

Discussion

  • Uneven distribution between women diagnosed with PCOS (177) and without PCOS (364)

  • PCA plots suggest that only a few variables are relevant for diagnosing PCOS (FSH & LH)

  • Body parameters show little clustering (BMI)

Conclusion

  • Not an optimal data set for drawing significant conclusions